A Methodology for Improving the Performance of Non-Ranker Feature Selection Filters

نویسندگان

  • Lior Rokach
  • Barak Chizi
  • Oded Maimon
چکیده

Feature selection is the process of identifying relevant features in the dataset and discarding everything else as irrelevant and redundant. Since feature selection reduces the dimensionality of the data, it enables the learning algorithms to operate more effectively and rapidly. In some cases, classification performance can be improved; in other instances, the obtained classifier is more compact and can be easily interpreted. There is much work done on feature selection methods for creating ensemble of classifiers. Thus, these works examine how feature selection can help ensemble of classifiers to gain diversity. This paper examines a different direction, i.e. whether ensemble methodology can be used for improving feature selection performance. In this paper we present a general framework for creating several feature subsets and then combine them into a single subset. Theoretical and empirical results presented in this paper validate the hypothesis that this approach can help finding a better feature subset.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)

Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...

متن کامل

A New Hybrid Method for Improving the Performance of Myocardial Infarction Prediction

Abstract Introduction: Myocardial Infarction, also known as heart attack, normally occurs due to such causes as smoking, family history, diabetes, and so on. It is recognized as one of the leading causes of death in the world. Therefore, the present study aimed to evaluate the performance of classification models in order to predict Myocardial Infarction, using a feature selection method tha...

متن کامل

Expected Divergence Based Feature Selection for Learning to Rank

(i) RankSVM SVM based pairwise ranker. (ii) RankBoost Weak ranker based pairwise ranker that uses boosting. (iii) LambdaMART LambdaMART uses gradient boosting to optimize a ranking cost function. Baseline 1: FS-BFS The FS-BFS is a wrapper based approach of feature selection for ranking [Dang and Croft, 2010]. The method partitions the F into non-overlapping k subsets and learns a ranking model ...

متن کامل

A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affects the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...

متن کامل

Impact of Feature Selection Techniques for Tweet Sentiment Classification

Sentiment analysis of tweets is a powerful application of mining social media sites that can be used for a variety of social sensing tasks. Common feature engineering techniques frequently result in a large numbers of features being generated to represent tweets. Many of these features may degrade classifier performance and increasing computational cost. Feature selection techniques can be used...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • IJPRAI

دوره 21  شماره 

صفحات  -

تاریخ انتشار 2007